Masking repeats while clustering ESTs

Authors

  • Korbinian Schneeberger
  • Ketil Malde
  • Eivind Coward
  • Inge Jonassen

Abstract

A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on the availability of repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause during clustering. Unlike traditional methods, ours infers repeats directly from the EST data and does not rely on any external library of known repeats. This makes the method especially suitable for analysing ESTs from organisms that lack good repeat libraries. We demonstrate that the result is very similar to performing standard repeat masking before clustering.
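
The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of inferring repeats from the EST data itself rather than from a library. It assumes a simple k-mer frequency rule: any k-mer found in more than `max_count` ESTs is treated as repeat-like and ignored when linking ESTs into single-linkage clusters. The function names `repeat_kmers`, `mask`, and `cluster` and the parameters `k` and `max_count` are hypothetical, and this is not the authors' method.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def repeat_kmers(ests, k, max_count):
    """Assumed rule: a k-mer present in more than max_count ESTs is repeat-like."""
    counts = Counter()
    for seq in ests.values():
        counts.update(set(kmers(seq, k)))  # count each k-mer once per EST
    return {w for w, c in counts.items() if c > max_count}


def mask(seq, k, repeats, mask_char="N"):
    """Replace every position covered by a repeat-like k-mer with mask_char."""
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in repeats:
            masked[i:i + k] = mask_char * k
    return "".join(masked)


def cluster(ests, k, max_count):
    """Single-linkage clustering: ESTs sharing any non-repeat k-mer are joined."""
    repeats = repeat_kmers(ests, k, max_count)
    parent = {name: name for name in ests}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> first EST that contained it
    for name, seq in ests.items():
        for w in set(kmers(seq, k)):
            if w in repeats:
                continue  # a match on a repeat-like k-mer is not evidence of shared origin
            if w in first_seen:
                ra, rb = find(name), find(first_seen[w])
                if ra != rb:
                    parent[ra] = rb
            else:
                first_seen[w] = name

    groups = {}
    for name in ests:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data: "TTTT" plays the role of a repeat shared by unrelated ESTs.
ests = {
    "e1": "GACCTTTT",
    "e2": "GACCA",
    "e3": "TTTTAGGA",
    "e4": "AGGAC",
    "e5": "CGCGTTTT",
}
print(cluster(ests, k=4, max_count=2))   # [['e1', 'e2'], ['e3', 'e4'], ['e5']]
print(cluster(ests, k=4, max_count=10))  # [['e1', 'e2', 'e3', 'e4', 'e5']]
print(mask("GACCTTTT", 4, {"TTTT"}))     # GACCNNNN
```

With the threshold effectively disabled (`max_count=10`), the shared "TTTT" chains all five ESTs into one false cluster; with it enabled, only the genuine pairs are joined. This naive rule would also flag k-mers shared by many ESTs from a single highly expressed gene, so a practical tool would need a way to tell such cases apart from true repeats.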


Similar resources

RBR: library-less repeat detection for ESTs

MOTIVATION Repeat sequences in ESTs are a source of problems, in particular for clustering. ESTs are therefore commonly masked against a library of known repeats. High quality repeat libraries are available for the widely studied organisms, but for most other organisms the lack of such libraries is likely to compromise the quality of EST analysis. RESULTS We present a fast, flexible and libra...

In silico prediction of UTR repeats using clustered EST data

Clustering of EST data is a method for the non-redundant representation of an organism's transcriptome. During clustering of large amounts of EST data, some large clusters (>500 sequences) are usually created. These can lead to iterative contig builds, consumption of a lot of computing time and improbable exon alignments, which is unfavourable. In addition, these clusters sometimes contain transc...

EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments

Expressed sequence tag (EST) sequencing has proven to be an economically feasible alternative for gene discovery in species lacking a draft genome sequence. Ongoing large-scale EST sequencing projects feel the need for bioinformatics tools to facilitate uniform EST handling. This brings about a renewed importance for a universal tool for processing and functional annotation of large sets of EST...

Algorithms for the Analysis of Expressed Sequence Tags

A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and library-independent method to eliminate this problem in the process of clustering. We demonstrate that the result is very similar to performing standard repeat masking before ...

A Fast Clustering System for a Huge Number of Nucleotide Sequences

Single-pass sequences of mRNA, called ESTs, have been determined extensively. They have been accumulated in the dbEST database in GenBank. The number of ESTs in dbEST exceeded eight million in August 2002. By clustering and assembling ESTs, we can conduct the following analyses. First, we can obtain complete ORF sequences based on ESTs that are fragment sequences of mRNA and do not ...

Journal:
  • Nucleic Acids Research

Volume 33

Publication date: 2005